Health informatics sits at the intersection of medicine, data science, and technology, and it shapes how health information is stored, analyzed, and used. The field helps clinicians and researchers uncover patterns in patient data, improve diagnostic accuracy, and personalize treatment plans. By turning raw medical records into actionable insights, it is reshaping healthcare delivery and population health management.

At Gist.Science, we bridge the gap between cutting-edge research and public understanding by curating the latest preprints from medRxiv specifically within this domain. Our team processes every new submission in this category, providing both accessible plain-language explanations and detailed technical summaries to ensure the science is clear for everyone, from policymakers to curious readers. Below are the latest papers in health informatics, freshly distilled and ready for you to explore.

Beyond Identifier Matching: An Empirical Characterization of Failure Modes in Biomedical Knowledge Graph Integration

This paper empirically demonstrates that relying solely on identifier matching for biomedical knowledge graph integration is insufficient, revealing that while cross-ontology and embedding-based methods increase coverage, they systematically introduce clinically significant failure modes like over-merging and semantic collapse that obscure critical distinctions in downstream applications.

Hu, S., Cheng, H., Gillenwater, L., Manpearl, K., Mandava, A., Wang, Y., Pividori, M., Stranger, B., Krishnan, A., Greene, C., Gao, Y. | 2026-05-28 | health informatics

Explainable AI for Data-Driven Design of High-Dimensional Predictive Studies

This paper introduces an Exploratory AI Recommender that leverages explainable AI to generate data-driven recommendations for feature selection, non-linear terms, and interactions, thereby significantly enhancing the predictive performance and interpretability of high-dimensional clinical models like the Cox Proportional Hazards model.

Yan, J., Machlanski, D., Butler, K., Dimitrakopoulos, P., Harrison, E. M., Guthrie, B. M., Tsaftaris, S. A. | 2026-05-24 | health informatics

Ambient AI Documentation in Mixed-Language Encounters: A Heuristic Evaluation of Spanish-English and Mandarin-English Conversations

This study evaluates an ambient AI documentation system's performance in mixed-language clinical encounters, finding that while overall transcription error rates are low and language switching is generally detected reliably, significant challenges remain with Mandarin-English code-switching, including high error outliers and frequent deletions at switch points.

Hu, D., Flores, D., Flores, L., Chien, R., Lam, K., Chow, E., Guo, Y., Tam, S., Perret, D., Pandita, D., Zheng, K. | 2026-05-22 | health informatics

Evaluating Large Language Models for Translating Multimodal Phenotype Documentations into Executable EHR Phenotyping Algorithms

This study evaluates frontier large language models for translating multimodal clinical phenotype documentation into executable EHR algorithms, finding that while they effectively interpret structured text, their performance significantly degrades with diagram-only inputs, ultimately identifying documentation quality rather than model capability as the primary bottleneck.

Yan, C., Xin, Y., Su, W.-C., Gangireddy, S., Durbhakula, S., Bruehl, S. P., Dickson, A. L., Li, L., Feng, Q., Malin, B. A., Derr, T., Wei, W.-Q. | 2026-05-22 | health informatics

Asymmetry between warmth and clinical substance in multilingual consumer health AI

This study reveals a critical asymmetry in multilingual consumer health AI: clinical substance and safety vary significantly by language, often failing silently in non-English contexts, while the tone remains consistently empathetic across all languages.

Ariel, D., Grumberg, L. R., Supakul, S., Wannasri, S., Mitchnik, I. Y., Lev, A., Ariyamethanon, W., Agbarieh, M., Miari, S., Laban, G., Hasid, B. | 2026-05-14 | health informatics

Epidemiology-Informed Graph Neural Networks for Predicting and Interpreting Transmissible Hospital-Acquired Infections: A Retrospective Cohort and Simulation Study

This paper proposes an epidemiology-informed graph neural network (EIGNN) framework that integrates mechanistic epidemiological models with data-driven contact networks to accurately predict and interpret hospital-acquired infection dynamics while ensuring clinical trust through transparency.

Vindas Yassine, Y. E., Bornet, A., Abbas, M., Geissbuehler, D., Rodrigues-Jr, J. F., Teodoro, D. | 2026-05-12 | health informatics

Three Decades of FDA Authorizations of AI/ML Enabled Medical Devices: Persistent Specialty Concentration and the Care Delivery Gap (1995 to 2025)

This cross-sectional analysis of 1,430 FDA authorizations from 1995 to 2025 reveals that while authorizations of AI/ML-enabled medical devices have surged exponentially, they remain heavily concentrated in image-rich diagnostic specialties like radiology, leaving other major clinical fields such as pathology, obstetrics, and behavioral health significantly underrepresented.

Golshani, P., Joseph, M. S. | 2026-05-12 | health informatics

Machine Learning and Explainable AI for Multi-State Classification of Malaria Transmission Dynamics in Kenya

This study develops and validates an interpretable machine learning framework using Extreme Gradient Boosting to accurately classify malaria transmission states across Kenya's 47 counties from 2015 to 2025, demonstrating that integrating epidemiological and environmental data can effectively support targeted surveillance and resource allocation.

Gogo, J. A., Wanyonyi, M. | 2026-05-12 | health informatics

MISP-Bench: Decomposing User-Provided False Priors into Answer, Rationale, and Guard Effects

The paper introduces MISP-Bench, a large-scale factorial benchmark that evaluates how open-weight language models respond to user-provided false priors in clinical and educational contexts. It finds that combined answer-rationale attacks exhibit sub-additive damage, that targeted distractors significantly increase sycophancy compared to arbitrary ones, and that specific safety guard strategies (like source-independence and explicit overrides) effectively mitigate misinformation susceptibility across diverse models.

Jeong, I., Kim, Y., Park, J.-H., Lee, H. | 2026-05-10 | health informatics